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Abstract 



In CNS*02, we proposed a hypothetical cross-supramodal integration system,
realized by a bidirectional neuronal network in which prefrontal cortex
(PFC) interacts with hippocampus (HC) to calculate the coherent
relationship between the supramodalities. The parameter for learning the
coherence may also be provided by Anterior Cingulate (AC) cortex. In this
paper we propose a top-down attention control system based on the outcome
that we presented at CNS*02. That is, PFC presumably interacts with
superior temporal (ST) neurons in order to extract the indicator component
representing the whole view of a face or object, by maximizing the
expectation value, where attention modulation can be taken into account in
distinguishing different faces. As a result we postulate that the objective
of top-down attention control is essentially to compute the abstraction of
face/object information based on the Expectation Maximization (EM)
algorithm, since the voluntary movements of facial viewpoints must play an
important role in integrating spatial and temporal properties.

Key words: Prefrontal and Superior-temporal Cortex, Distributed Cortical
Neuronal Network, Spatiotemporal Attention, Cross Supramodality, EM
Algorithm



1 Introduction 



PF cortex is involved in a broad array of cognitive functions, including
learning, memory, attention, executive function, planning, and judgment [1].
It is known that PFC also acts as an executive committee by consolidating
hippocampus and other cortical areas. In CNS*01, we suggested the
computational [...] as to compute the spatiotemporal executive neuronal
network to learn the coherent relationship [...] spatiotemporal attention
[...] neurobiological [...] indicated that the rewiring mechanism can be
induced by the inhibitory [...] excitatory neurons of cerebral cortex [...]
may expertise the [...] representations of spatial information would [...]
STS network related to [...] the PFC [...] to extract the indicator [...]

Fig. 1. Hypothetical connection of PFC and STS
2 Computational Model 

[...] proposed system is shown in Fig. 2 [...] initially developed by [5]
is hierarchically [...] classifiers using Support Vector Machines (SVM)
[...] pattern recognition for a two-class [...] determining the separating
hyperplane that has maximum distance [...] set. The closest points [...]
are called Support Vector (SV) and Margin respectively [6]. In our
framework [...] the indicator facial component [...] informative source to
distinguish different faces. Face [...] algorithm aims to detect faces of
different sizes and [...] input image [4]. More precisely, [...] where
component classifiers independently [...] the first level. This classifier
allows the [...] eyes, nose and mouth. [...] a configuration classifier
performs the final face detection [...] of the component classifiers. [...]
classifier indicates if there is a [...]. One of the main problems in
component-based object recognition is the selection of the components: how
to find [...] components [...] to distinguish a particular object from
[...]. [...] we employ the proposed system shown in Fig. 2 [...] be taken
into account of maximizing the [...] viewpoint of faces are fluctuated by
attention [...] differentiate the abstraction of indicator component [...]
semantic [...]
Let the expectation value E be [...]

Fig. 2. Proposed multiple classifier system to attain [...] component (Left)
and conventional two-level single classifier [...]

[...] each SVM component [...] calculate the probability density function
[...] mapping [...]. Importantly, in our framework the attention [...]
component [...] the expectation value E is calculated in the E-step by
[...] viewpoint) to accordingly obtain the new estimate [...]

To implement our indicator component-based approach, training images are
[...] illuminations and the [...] background. After the images are
collected, pixel values are used as inputs to each layer of a SVM component
classifier as shown in Fig. 2. The [...] image is then converted into gray
values and is re-scaled to 40 x 40 pixels. Histogram equalization is also
applied to remove variations existing in image brightness and contrast. The
1,600 gray values of each face image are normalized to the range between 0
and 1. Each image is represented by a feature vector of length 1,600, the
total number of pixels in the image. These feature vectors serve as the
inputs to the SVM face classifier during the training process. With respect
to the training dataset, it includes 974 [...]

[...] facial viewpoints has the rotation by right to left, or left to right
within -10° to +10°. By contrast, the rotation of facial [...] +12° to +42°
in Fig. 3 (right). Furthermore, [...] shows the expectation values in both
cases, which are calculated based on Eq. (1). Conclusively, their results
demonstrate that the expectation value [...] facial viewpoints. We suggest
the computation of expectation value where attention class Q is modulated
over different component features, in order to restrict the quantity and
quality of training data mapping into the feature space in order to find
the optimal subset of selected features, as the indicator component.
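
The preprocessing steps described above (re-scaling to 40 x 40 pixels, histogram equalization, normalization to the range 0 to 1, and flattening into a 1,600-element vector) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the input is already a gray-scale image stored as a 2D list of integers in 0..255, uses nearest-neighbour resampling because the text does not name a resampling method, and all function names are hypothetical.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of gray values (assumed method)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def equalize_histogram(img, levels=256):
    """Histogram equalization for integer gray values in [0, levels - 1]."""
    flat = [v for row in img for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    # Cumulative distribution of the gray values.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in img]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in img]


def face_to_feature_vector(img, size=40, levels=256):
    """Resize to size x size, equalize, scale to [0, 1], and flatten."""
    small = resize_nearest(img, size, size)
    equalized = equalize_histogram(small, levels)
    return [v / (levels - 1) for row in equalized for v in row]


# Hypothetical 80 x 60 gradient image standing in for a gray-scale face crop.
demo = [[(r + c) % 256 for c in range(60)] for r in range(80)]
vec = face_to_feature_vector(demo)
```

The resulting list has 40 x 40 = 1,600 entries and could be passed to any SVM implementation as one training sample.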
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Pig. 3. F^al movement pattern by -lO" to +10" (Left) and +12" to +42» (Right) 



3 Discussion and Future work



We present the proposed learning system as it relates to component-based
face recognition where top-down attention control can be taken into account
of facial movements, which are dedicated the possibility that is
[...]-dependent learning with the intrinsic nature of facial perception.
[...] we assume faces may engender both face-specific and general object
processing [...] that face-specific processing might be revealed only if
the [...] object processing. That is, [...] recognition system is itself
not monolithic [...] become differentiated by experience. It hypothetically
allows us to predict the neuronal computation across PFC and STS/IT. Prior
to the mechanism of human face recognition, many studies so far indicated
that it is the intrinsic nature itself rather than the
experience-dependency. However, several lines of evidence make this
conclusion unlikely based on infant experience of face processing
[9][10][11][12]. Furthermore, the development of face processing has
indicated its significance for social cognition. For example, infants
orient more rapidly to peripheral visual targets when cued by the direction
of eye gaze presented by a face [13]. Semantic priming refers to the fact
that recognition of a word or object of a particular category, e.g.
animals, is better and faster when preceded by a stimulus of the same
category (e.g. cat preceded by dog) than when preceded by a stimulus of a
different category (e.g. cat preceded by pencil). Behavioral studies of
face priming indeed have been carried out to infer the processes involved
in face recognition. We showed that the PFC-STS model suggests an EM
algorithm that basically originates from a top-down attention control
process biased by visual motion. In our framework, the top-down attention
regulates each SVM component classifier by calculating the expectation
value to extract the indicator component. Future work may allow us to apply
the top-down attention to emotional perception and even to the more general
domain of object perception by extending the class of indicator component
which is obtained from rotated faces.

Fig. 4. Histogram shows the learning result by SVM. Vertical axis denotes
the margin while horizontal axis represents the viewpoint discretized by
[...]. [...] shows the most distinction for the left-side [...] component
(Left) [...]
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Abstract 

[...] preselected points on the face. The direction of growing is de[...]
changes across different individuals and views.

1. Introduction 

In component-based face recognition the classification is based on local
components as opposed to the global approach where the whole face [...].
The main idea behind using components is to compensate for pose changes by
allowing a flexible geometrical [...] robust against local changes in the
image pattern caused by partial [...] performed by independently matching
templates of the eyes, the nose and the mouth. A similar approach with an
additional alignment stage was proposed in [3]. In [7] [...] computed on
the nodes of a 2D elastic graph. A comparison between global and
component-based face recognition [...]

The main difficulty in component-based object recognition is the selection
of the components, i.e. how to find discriminatory components that allow to
distinguish a [...] learn rectangular facial components for a face
detection system. The algorithm starts with small initial components
located around preselected points on the face. Each component is grown
[...] error bound of [...] method to the multi-class problem of face
recognition [...] cross-validation [...]. Not only should the
cross-validation error give us a better estimate of the prediction error,
it also makes the technique applicable to other types of classifiers
besides SVMs. [...] pose [...] and poses. This information might be
relevant for a wide variety of face recognition and verification systems.

The outline of the paper is as follows: The image data is [...]
experimental results and the discussion. Section 5 concludes the paper.




Figure 1: Original images and synthetic images for all six subjects in the
database. The face images are arranged according to the ID numbers,
starting with person 1 in the upper left corner and ending with person 6 in
the lower right.



2. Face Data 

[...] of corresponding components from a large number of training images.
To be able to automate the extraction process we used a set of textured 3D
head models with known point-wise correspondences. The 3D head models were
computed from image triplets (front, half profile, profile view) of six
subjects using the morphable model approach described in [...]. Fig. 1
shows an original image and a rendered image [...]. Approximately 10,900
synthetic faces were generated at a resolution of 58 x 58 by rendering the
six 3D face models under varying pose and illumination. The faces were
rotated in depth from 0° to 44° in 2° increments. They were illuminated by
ambient light and a single directional light pointing towards the center of
the face. The directional light source was positioned between -90° and 90°
in azimuth and 0° and 75° in elevation. Its angular position was
incremented by 15° in both directions. Example images with different pose
and illumination settings are shown in Fig. 2.

Figure 2: Examples of synthetic faces from the training set. Upper row
shows the variations in pose and the bottom row shows different
illuminations.
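
The size of the training set follows from the rendering parameters given in this section. The sketch below enumerates one possible parameter grid, assuming both end points of every range are included and every combination is rendered once; the variable names are hypothetical. Under these assumptions the grid has 10,764 entries, which is close to, but not identical with, the "approximately 10,900" images reported, so the exact rendering schedule may have differed slightly.

```python
from itertools import product


def inclusive_range(start, stop, step):
    """Integer grid from start to stop with both ends included."""
    return list(range(start, stop + 1, step))


SUBJECTS = range(1, 7)                    # six 3D head models
ROTATIONS = inclusive_range(0, 44, 2)     # rotation in depth, degrees
AZIMUTHS = inclusive_range(-90, 90, 15)   # light azimuth, degrees
ELEVATIONS = inclusive_range(0, 75, 15)   # light elevation, degrees

# One tuple (subject, rotation, azimuth, elevation) per rendered image,
# assuming a full Cartesian product of all settings.
training_grid = list(product(SUBJECTS, ROTATIONS, AZIMUTHS, ELEVATIONS))
print(len(training_grid))
```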

To build the cross-validation set we rendered the 3D head models for
slightly different viewpoints and illumination conditions. The faces were
rotated in depth from 1° to 45° in 2° steps. The position of the
directional light source varied between [...] in azimuth and between -22.5°
and 67.5° in elevation. Its angular position was incremented by 30° in both
directions. In addition, each face was tilted by ±10° and rotated in the
image plane by ±5°. The validation set included about 18,200 face images,
some examples are shown in Fig. 3.



Figure 3: Examples of synthetic faces from the validation set. The faces in
the cross-validation set were tilted by ±10° and rotated in the image plane
by ±5°.



3. Learning Components 

An intuitive choice of components for face recognition would be the eyes,
the nose and the mouth. However, it is not clear what exactly the size and
shape of these components should be and whether there are other components
which are equally important for recognition. Furthermore, we would like to
quantify the discriminatory power of each component and analyze how the
optimal set of components changes over pose and across different subjects.
This can be accomplished by an algorithm for learning components which was
developed in the context of face detection [...]. The algorithm starts with
a small rectangular component located around a preselected point in the
face (e.g. center of the left eye). The component is extracted from each
face image to build a training set. A component classifier is trained
according to the one-vs-all strategy, i.e. the components of one person are
trained against the components of all other people in the database. We then
estimate the prediction error of each component classifier by
cross-validation. To do so we extract the components from all images in the
validation set based on the known locations of the reference points.
Analogous to the training data, the positive validation set includes the
components of one person and the negative set includes the components of
all other people. After we determined the recognition rate (CV rate) of the
current component classifier on the validation set we enlarge the component
by expanding the rectangle by one pixel into one of four directions: up,
down, left or right. Again, we generate training data, train an SVM and
determine the CV rate. We do this for expansions into all four directions
and finally keep the expansion for which the CV rate increases the most.
This process can be repeated for a preselected number of expansion steps.
We ran our experiments on fourteen components, most of them located in the
vicinity of the eyes, nose and mouth (see Fig. 4).
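
The growing procedure described above can be sketched as a greedy search over rectangles. This is an illustrative sketch only: `cv_rate` is a hypothetical callback that would extract the component, train a classifier and return its recognition rate on the validation set, and it is replaced here by a toy function so the example runs on its own. The rectangle encoding (top, left, bottom, right, inclusive) and the 58 x 58 image bounds are assumptions, and the sketch keeps the initial rectangle in its history alongside the grown ones.

```python
def grow_component(rect, cv_rate, steps=10, bounds=(58, 58)):
    """Greedily grow a rectangle (top, left, bottom, right), inclusive.

    At every step the rectangle is expanded by one pixel in each of the
    four directions, and the expansion with the highest CV rate is kept.
    Returns the best (rect, rate) seen and the full history.
    """
    height, width = bounds
    moves = [(-1, 0, 0, 0), (0, -1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
    history = [(rect, cv_rate(rect))]
    for _ in range(steps):
        candidates = []
        for delta in moves:
            top, left, bottom, right = (a + d for a, d in zip(rect, delta))
            if top < 0 or left < 0 or bottom >= height or right >= width:
                continue  # expansion would leave the image
            cand = (top, left, bottom, right)
            candidates.append((cv_rate(cand), cand))
        if not candidates:
            break
        best_rate, rect = max(candidates)
        history.append((rect, best_rate))
    best = max(history, key=lambda item: item[1])
    return best, history


def toy_cv_rate(rect):
    """Stand-in for a real cross-validation run; peaks at a fixed rectangle."""
    target = (20, 10, 30, 22)
    return 1.0 - 0.01 * sum(abs(a - b) for a, b in zip(rect, target))


best, history = grow_component((22, 12, 28, 18), toy_cv_rate, steps=10)
print(best)
```

With a real classifier, each call to `cv_rate` involves a full training run, so the cost per growing step is four training runs per component.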





Figure 4: [...] components for a frontal and [...]



4. Experiments 

The rotation in depth of the faces in the training and validation sets
ranged from 0° to 44°, with increments of 2°. We split the data into three
subsets: [0°, 14°], [16°, 30°] and [32°, 44°], which in the following are
referred to as pose intervals 1, 2, and 3 respectively. Each subset
included about 630 training and 1060 validation images of each person. To
speed up computation, we randomly removed two thirds of the images in the
validation subsets, leaving 350 images in each validation subset. For each
person and each pose interval we trained a set of fourteen component
classifiers, resulting in 252 component classifiers overall.
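
The split into pose intervals and the total number of classifiers can be checked with a few lines. The sketch uses the nominal 0° to 44° rotation grid stated above; the names are illustrative.

```python
POSE_INTERVALS = {1: (0, 14), 2: (16, 30), 3: (32, 44)}  # degrees, inclusive


def pose_interval(rotation_deg):
    """Return the pose interval index (1, 2 or 3) for a rotation in depth."""
    for index, (low, high) in POSE_INTERVALS.items():
        if low <= rotation_deg <= high:
            return index
    raise ValueError(f"rotation {rotation_deg} is outside all pose intervals")


rotations = range(0, 45, 2)
poses_per_interval = {
    index: sum(1 for r in rotations if pose_interval(r) == index)
    for index in POSE_INTERVALS
}

n_persons, n_components = 6, 14
n_classifiers = n_persons * len(POSE_INTERVALS) * n_components
print(poses_per_interval, n_classifiers)
```

Six persons, three pose intervals and fourteen components give 6 x 3 x 14 = 252 classifiers, matching the figure in the text.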

In a first experiment we determined the CV rate depending on the width and
height of a symmetric component. Symmetric components are rectangular
components which are centered on a reference point, i.e. their expansions
to the left and right are identical as are the expansions up and down. The
dependency on only two variables allows a 3D visualization of the CV rate.
We computed the CV rate for SVMs with second-degree polynomial kernel for
all sizes of a symmetric component between 5 x 5 and 21 x 21 pixels. As
features we used the histogram-equalized gray values of the component
patterns. Some results are depicted in Fig. 5. The first four rows show the
CV rates for the components 2 (right eye), 3 (center of the mouth), and 10
(right cheek) for pose interval 3 (32° to 44°). The last two rows show how
the CV rate of component 3 changes across the pose. The surfaces are
relatively smooth with few local maxima. It can be expected that
gradient-based methods, such as the one described in the previous Section,
can be successfully applied to find the maxima. The diagrams also show that
the CV rate for a given component strongly changes between subjects,
indicating that person-specific components yield better discrimination
results than a universal set of components. Another important observation
is that the results for a given component/subject combination vary over
pose, which suggests the use of view-specific components.
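
The exhaustive sweep over symmetric component sizes can be sketched as below. The sketch assumes only odd widths and heights are used (so the rectangle can be centered exactly on the reference pixel), which the text does not state explicitly; `cv_rate` is again a hypothetical callback standing in for training and validating an SVM, replaced by a toy function here.

```python
def symmetric_rect(center, width, height):
    """Rectangle (top, left, bottom, right) centered on a reference pixel."""
    row, col = center
    return (row - height // 2, col - width // 2,
            row + height // 2, col + width // 2)


def cv_surface(center, cv_rate, sizes=range(5, 22, 2)):
    """CV rate for every (width, height) pair of a symmetric component."""
    return {
        (w, h): cv_rate(symmetric_rect(center, w, h))
        for w in sizes
        for h in sizes
    }


def toy_cv_rate(rect):
    """Stand-in for a real validation run; peaks at 13 wide by 9 high."""
    top, left, bottom, right = rect
    width, height = right - left + 1, bottom - top + 1
    return 1.0 - 0.02 * (abs(width - 13) + abs(height - 9))


surface = cv_surface((29, 29), toy_cv_rate)
best_size = max(surface, key=surface.get)
print(best_size)
```

Because the toy surface has a single maximum, a greedy search would find the same size; on real data the sweep serves as a reference for how smooth the surface is.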

In a second experiment we applied the algorithm described in Section 3 to
learn a set of rectangular components. The initial size of our 14
components was set to 9 x 9 pixels. The number of iterations in the growing
process was limited to 10, resulting in 10 components of different sizes
and with different CV rates. Of the 10 components we selected the one with
the maximum CV rate as our final choice. As for the previous experiment, we
trained SVMs with a second-degree polynomial kernel on the
histogram-equalized gray values of the components. Fig. 6 shows the learned
components across different subjects and views. The average intensity of
the component encodes the CV rate, bright values indicate a high CV rate.
The pictures show that the CV rates decrease with increasing rotation. This
effect is more prominent for components on the left side of the face (the
side to which the faces were rotated) than for components on the right
side. For the given system, the optimal pose interval for recognizing faces
is the near frontal interval, which is similar to results reported in
psychophysical experiments on face recognition in humans.




Figure 5: The cross-validation recognition rate for symmetric components.
The first four rows show the CV rates for the components 2 (right eye), 3
(center of the mouth), and 10 (right cheek) for pose interval 3: [32°, 44°].
The last two rows show how the CV rate changes across the pose intervals
[0°, 14°], [16°, 30°] and [32°, 44°].






Title: Expectation Maximization of Prefrontal-superior
Temporal Network by Indicator Component-based Approach
Applicants: Takamasa Koshizen, et al.
Docket No.: 20911-08046









